A. A FIRST INSIGHT


"A systolic system is a network of processors which rhythmically compute and pass data through the system. Physiologists use the word "systole" to refer to the rhythmically recurrent contraction of the heart and arteries which pulses blood through the body. In a systolic computing system, the function of a processor is analogous to that of the heart. Every processor regularly pumps data in and out, each time performing some short computation, so that a regular flow of data is kept in the network."

"...matrix computations can be pipelined elegantly and efficiently on systolic networks having an array structure."

"...Special purpose hardware devices based on systolic arrays can be built inexpensively using the VLSI technology."

These are parts of the abstract of the paper in which Professors H.T. Kung and C.E. Leiserson first presented the concept of systolic arrays [Ref. 1]. They fully describe the concept and the context to which the term "systolic array" applies.

Several authors have presented papers on systolic arrays in recent years, but this subject is very new and everything dates back to 1978.


B. SYSTOLIC ARRAYS: THE BASIC PRINCIPLE


A systolic system consists of a set of interconnected cells, each capable of performing some simple operation. Information flows between cells in a pipelined fashion, and communication with the outside world occurs only at "boundary cells".

The usual computational tasks are basically divided into two categories: compute-bound and input/output (I/O) bound. If the total number of computing operations is much larger than that of I/O operations, then we have a compute-bound task. Systolic arrays are useful to speed up this type of task. The array is able to use each input element a number of times, thus achieving a high computational throughput without increasing memory bandwidth [Ref. 2].
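The distinction between the two categories can be made concrete with a rough operation count. The sketch below is only an illustration: the function names are hypothetical, and the counts assume dense n-by-n matrices with one word read per operand element and one word written per result element.

```python
def operation_counts(n):
    """Arithmetic and I/O counts for two tasks on dense n-by-n matrices."""
    io_words = 3 * n * n  # read two operand matrices, write one result
    return {
        "matrix addition": {"compute": n * n, "io": io_words},
        "matrix multiplication": {"compute": n ** 3, "io": io_words},
    }


def compute_to_io_ratio(counts):
    """Ratio well above 1 suggests a compute-bound task."""
    return counts["compute"] / counts["io"]


for task, counts in operation_counts(100).items():
    print(task, round(compute_to_io_ratio(counts), 2))
```

Under these assumptions matrix addition stays I/O bound for every n, while the ratio for matrix multiplication grows with n, which is the kind of task where reusing each input element inside the array pays off.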


C. DEVISED ARCHITECTURES


A number of architectures have already been proposed for solving the traditional problems. These are linear arrays for performing matrix-vector multiplication and finding the solution of triangular linear systems; two-dimensional arrays for matrix-matrix multiplication/addition, least squares solution and LU factorization, etc. [Ref. 3].


D. HARDWARE EXPLORATORY DEVELOPMENT


Because this subject is new, much of the development is still in an exploratory stage. In order to further evaluate the hardware potential of systolic arrays, TRW/ESL has developed a processor called Phoenix that has already been used to provide two-dimensional FIR filters for image processing [Ref. 3: page 3]. To evaluate systolic algorithms and architectures, the Naval Ocean Systems Center has developed a reconfigurable one/two-dimensional systolic array with 64 processing elements [Ref. 4]. This systolic testbed processor can be reconfigured under software control to perform as a rectangular, hexagonal, or linear systolic array. The main goal of its design was to achieve flexibility in algorithm evaluation, and its cells are much slower than those used in the TRW/ESL Phoenix processor or in a possible implementation on a VLSI chip.


E. WHY NOT EXPLORE IT WITH SOFTWARE?


During the preliminary investigations in this field, the approach was to select a case study and try to go deeper in the process of understanding the way a particular algorithm is mapped onto a systolic architecture. It was discovered that it is not always straightforward to understand the mapping process. An algorithm like the one proposed by H.T. Kung and W.M. Gentleman for doing matrix triangularization can be really difficult to understand if one does not possess an adequate tool [Ref. 7: page 22]. The algebraic manipulation can be difficult when dealing with systolic arrays because the interaction of the processors tends to lead to a very complicated set of equations in only a few steps, and it becomes difficult to achieve any generalization. Some notations have been proposed to help in modeling the parallel processing [Ref. 5], but no systematic method for mapping an algorithm onto an array exists. There is a lack of appropriate tools for systolic algorithm development. However, with the development of graphics technology, a means now exists that can be used to help people better understand how a systolic array works. If we combine this possibility with interactive programming techniques, it may be possible to provide a user-friendly tool that can be used to achieve the goal which is the main guideline of this investigation: to provide an easier way to understand systolic array operation. A case study will be presented in Chapter III that performs Matrix Triangularization with the Givens Rotations Algorithm. Eventually, the systolic processing can be understood and checked against real data with the use of our Systolic Array Graphics Simulator (SYSGRAS). This approach, in contrast with that followed by researchers at the N.O.S.C. in San Diego, is to take full advantage of the software, using computer graphics presentation to improve or to add another dimension to the man/machine interface.

We want to point out that this work must be seen as an investigation of the possibilities of using this kind of facility as a tool rather than as a final product. Only experimentation in practical work can show the advantages or the limitations of this approach. As in any other area of engineering, feedback from users is important in the evaluation and future improvement of this tool.


II. THE SIMULATOR APPROACH

The Systolic Array Graphical Simulator (SYSGRAS) is implemented to allow interactive design of any type of systolic array. The cells and their interconnection links are defined by the user in a guided manner. The optional cell types are built into SYSGRAS. The user can construct any array using the existing cell types or by introducing new cell types, but the introduction of a new cell type cannot be done interactively. This requires a subroutine to describe and support that type, but the operation is simple and will be explained later. The presently available types can support most of the algorithms found in the literature.

SYSGRAS is designed with a requirement of being user friendly. The amount of data that a user normally has to enter in a session is quite large, so it is important to offer the possibility of correcting errors, reviewing entered data, and storing data and results from the session. The stored data can also be used later in another session. The sequence of operations required during a session is shown below:


1) Establishment of general conditions:
   - new or previous problem to run.
   - need to store the calculations for later printing.
   - ability to interrupt or review the progress of systolic processing on a clocked basis.

2) Definition of cells' attributes:
   - type of processor in the cell.
   - graphical shape of the cell.
   - screen coordinates of the cell.
   - interfaces (importer cell, receiver of external data; internal cell, only exchanges data with other cells; exporter cell, data transmitter to external world).

3) Definition of links:
   - define the origin and destination of each link. The origin is a port in a cell. Each cell has a number of output and input ports. The function of each one depends on the processor type. The destination is an input port in another cell. A link is defined by the origin cell output port and the destination cell input port.

4) Presentation on the graphical screen:
   - the picture of the user-defined array is initially presented on the screen with all processor registers initialized to zero. Since the presentation is strongly based on colors, the absence of data interaction in the cells makes them all turn black. The identification number given by the user to each cell is placed at the cell's lower right corner, and red arrows indicate the links between cells and the direction of data flow (see Fig. 4.7 in Chapter IV).

5) Data input and screen update:
   - as soon as the data coming from the external world to the importer cells are entered, the whole screen is updated to represent the status of the systolic array at that particular clock cycle. The external data are entered at each clock cycle, and the screen is updated accordingly.

6) Review of the complete systolic process:
   - optionally, the user can review the complete process. All the data generated in a session are stored automatically, and the graphics presentation of any previous session can be seen again without any additional data entry.
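The cell and link attributes listed above can be pictured as a small data model. The following is a minimal sketch, not the SYSGRAS implementation: the class and field names are hypothetical, and the 8x8 grid check assumes the chessboard screen layout described later for SYSGRAS.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    ident: int                   # identification number chosen by the user
    processor: str               # processor type held by the cell
    shape: str                   # graphical shape, e.g. "square" or "octagonal"
    row: int                     # screen coordinates on the grid
    col: int
    interface: str = "internal"  # "importer", "internal" or "exporter"


@dataclass(frozen=True)
class Link:
    origin_cell: int
    origin_port: int             # an output port of the origin cell
    dest_cell: int
    dest_port: int               # an input port of the destination cell


@dataclass
class SystolicArray:
    cells: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_cell(self, cell):
        if not (0 <= cell.row < 8 and 0 <= cell.col < 8):
            raise ValueError("cell lies outside the 8x8 screen grid")
        self.cells[cell.ident] = cell

    def add_link(self, link):
        if link.origin_cell not in self.cells or link.dest_cell not in self.cells:
            raise ValueError("link refers to an undefined cell")
        self.links.append(link)


array = SystolicArray()
array.add_cell(Cell(1, "buffer", "square", 0, 0, "importer"))
array.add_cell(Cell(2, "buffer", "square", 1, 0, "exporter"))
array.add_link(Link(origin_cell=1, origin_port=1, dest_cell=2, dest_port=1))
```

Keeping links as (cell, port) pairs mirrors the way a link is defined in step 3, and validating against the defined cells catches the kind of entry error the interactive session is meant to let the user correct.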


The central idea of SYSGRAS is to allow a user to construct any desired systolic array and to feed it test data such that insight can be gained about the data interactions of the systolic algorithm. The user has the capability to correlate the propagation of the input data with the colors of the cells to better understand the process. Certainly, the more creative the color assignment, the better the interactions are revealed. There is no restriction as to how to do the correlation, and several examples will be presented. The colors that are available to an input datum are red, green and blue. If a cell receives data from more than one source with different primary colors, these colors will combine. The cell will show on the screen with a resulting color that is the combination of the input primary colors. This resulting color is not passed on to other cells. The propagation of data is associated with the primary colors, and in this way the time it takes for each data wavefront to propagate through the array can be tracked well. The screen display shows the array in a way selected by the user. The cells are presented with the shape, position, identification and connections requested by the user. The colors and numerical results assumed in each clock cycle show the data interaction. The numerical results that appear on the screen are in a fixed format and with rather limited precision. If one wants greater precision, there is an option to record the session, which can show results with up to five decimals on printer output. These results can be checked against the known values to verify the correct operation of the mapped systolic algorithm.
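The color rule can be sketched as a lookup on the set of primary colors arriving at a cell. This is an illustration only; the names are hypothetical, and showing white when all three primaries meet is an assumption based on ordinary additive mixing rather than something stated here.

```python
MIXED_COLORS = {
    frozenset(): "black",  # no data interaction in the cell
    frozenset({"red"}): "red",
    frozenset({"green"}): "green",
    frozenset({"blue"}): "blue",
    frozenset({"red", "green"}): "yellow",
    frozenset({"red", "blue"}): "magenta",
    frozenset({"green", "blue"}): "cyan",
    frozenset({"red", "green", "blue"}): "white",  # assumption: additive mixing
}


def displayed_color(input_colors):
    """Color shown by a cell, given the primary colors of its input data."""
    return MIXED_COLORS[frozenset(input_colors)]


print(displayed_color(["green", "red"]))
print(displayed_color([]))
```

Only the primary colors travel with the data; the mixed color is computed per cell and per clock cycle, which is why it is derived here from the inputs rather than stored.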




III. A CASE STUDY: MATRIX TRIANGULARIZATION





A. A REASON FOR MATRIX TRIANGULARIZATION: QR DECOMPOSITION 


The technique of QR decomposition is useful to solve the linear Least Squares Problem. There are other possible approaches in the literature to solve this type of problem [Ref. 6]. We will select the QR Decomposition Method as mentioned by Gentleman and Kung in [Ref. 7: page 19]. Let us present a brief explanation of the method.

Given a linear system defined by


AX = B


where A is an (n x p) matrix, X is a (p x 1) column vector and B is an (n x 1) column vector, we want to find the vector X such that ||B - AX|| is minimized. The vector X is also called the vector of regression coefficients. This is in fact a vector of estimated elements, and as such it strictly should be written with a hat, but for convenience X will continue to be used. Likewise, a more rigorous matrix notation would use a specially marked symbol instead of the plain A. If we are able to find an orthogonal matrix Q such that QA = R, where R is an upper triangular matrix, then we will have


QAX = RX = QB


and, as a result, RX = QB.

As R, Q, and B are all known, we can find X by Back Substitution. Gentleman has shown, as seen in [Ref. 8: page 329], that this approach solves the Least Squares Problem because it solves the normal equations of the problem [Ref. 6: page 148].
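Once the system has the triangular form RX = QB, Back Substitution recovers X from the last unknown upwards. The sketch below assumes a square upper triangular R with nonzero diagonal; the function name and the small numeric system are hypothetical illustrations, not the data of the case study.

```python
def back_substitute(R, y):
    """Solve R x = y for x, with R square and upper triangular."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Subtract the contribution of the unknowns already solved.
        known = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - known) / R[i][i]
    return x


R = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
y = [3.0, 13.0, 8.0]
print(back_substitute(R, y))
```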


An accurate QR Decomposition can be obtained in several ways, like the Gram-Schmidt Process or Householder Transformations. For this particular study we will stick to the Givens Rotations Method. This method requires the use of a sequence of plane rotations on matrices A and B that will convert the linear system to a representation of the form


RX = QB


which is straightforward to solve.

The Givens Rotations Method has two different implementations [Ref. 8: page 331]. We will study the one called "with square roots".


B. QR DECOMPOSITION BY GIVENS ROTATIONS WITH SQUARE ROOTS


If Q is an orthogonal matrix, Y = QA is called an orthogonal transformation and the columns of Q, say q(1), q(2), ..., are orthogonal. In order to better understand what an orthogonal transformation means, we can interpret the q(i)'s as unit vectors (because of their unit length) that represent the new reference frame in the original coordinate frame. If we apply an orthogonal transformation to a matrix (or to a vector), we will be in fact changing the reference coordinates for that vector. The information content of the transformed matrix (or vector) will be the same, although its form differs from the previous one because of the new reference. The columns of Q are given by the direction cosines of the new reference axes with respect to the old reference system [Ref. 12: page 354].

An orthogonal transformation can be divided into a sequence of elementary transformations. In the case of Givens Rotations, each one of these elementary transformations is a rotation about one coordinate axis of the original reference frame (which may be n-dimensional).

In geometry we know that the transformation

\[
\begin{bmatrix} a \\ b \\ c \end{bmatrix}
=
\begin{bmatrix}
\cos(x1) & \cos(x2) & \cos(x3) \\
\cos(y1) & \cos(y2) & \cos(y3) \\
\cos(z1) & \cos(z2) & \cos(z3)
\end{bmatrix}
\begin{bmatrix} a1 \\ b1 \\ c1 \end{bmatrix}
\]

applied to a vector [a1, b1, c1] will rotate the original reference frame about the 3 axes at the same time. The direction cosines of the new x-axis are cos(x1), cos(x2) and cos(x3); the direction cosines of the new y-axis are cos(y1), cos(y2) and cos(y3), and so on. The transformation effect achieved by the matrix of cosines is similar to that obtained by the Givens Rotation transformation matrix Q.

We can split the cosines matrix into a product of matrices, each one representing a rotation about one axis of the original frame. This technique of decomposing the operation into a product of operations, each one being responsible for rotating the system matrix by an angle TETA about one axis of the original frame, is exactly what is done in the Givens Algorithm. Each individual Givens Rotation is represented by a matrix D(j,i) of the form

\[
D(j,i) =
\begin{bmatrix}
1 & \cdot & \cdot & \cdot & \cdot \\
\cdot & c & \cdot & s & \cdot \\
\cdot & \cdot & 1 & \cdot & \cdot \\
\cdot & -s & \cdot & c & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix}
\]

with c in positions (i,i) and (j,j), s in position (i,j) and -s in position (j,i), where sqr(c) + sqr(s) = 1, c = cos(TETA), s = sin(TETA) and dots represent zeros [Ref. 6: page 153]. This elementary rotation will convert the (j,i) element of the matrix which it premultiplies into zero, that is, the (j,i) element of the product of matrix D(j,i) and A will be

-s.a(i,i) + c.a(j,i) = 0

and so it follows that

d = sqrt(sqr(a(i,i)) + sqr(a(j,i)))

c = a(i,i)/d and s = a(j,i)/d
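The formulas for d, c and s translate directly into a few lines of code. The sketch below is an illustration with hypothetical names; returning the identity rotation when both elements are zero is an assumption added to avoid a division by zero.

```python
import math


def givens_coefficients(a_ii, a_ji):
    """Return (c, s, d) for the rotation that annihilates element (j,i)."""
    d = math.sqrt(a_ii * a_ii + a_ji * a_ji)
    if d == 0.0:
        return 1.0, 0.0, 0.0  # assumption: nothing to rotate, use identity
    return a_ii / d, a_ji / d, d


c, s, d = givens_coefficients(3.0, 4.0)
residual = -s * 3.0 + c * 4.0  # the (j,i) element after the rotation
print(c, s, d, abs(residual) < 1e-12)
```

The rotated pivot element c.a(i,i) + s.a(j,i) equals d, so the length of the column is carried by the diagonal element after each rotation.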


If the matrix A is (n x n), it will be necessary to have (n-1) elementary rotations to turn all elements of the first column (with the exception of a(1,1)) into zero, (n-2) rotations for the second column, and so on. For a (3 x 3) matrix, three rotations will suffice.
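The column-by-column sweep described above can be sketched as follows. This is a plain sequential illustration, not the systolic mapping; the function name and the test matrix are hypothetical, and a rotation is counted for every subdiagonal element even when that element is already zero.

```python
import math


def givens_triangularize(A):
    """Reduce A to upper triangular form; return (R, number_of_rotations)."""
    R = [row[:] for row in A]
    n_rows, n_cols = len(R), len(R[0])
    rotations = 0
    for i in range(n_cols):                 # pivot column
        for j in range(i + 1, n_rows):      # rows below the pivot
            rotations += 1
            d = math.hypot(R[i][i], R[j][i])
            if d == 0.0:
                continue                    # both elements zero: identity rotation
            c, s = R[i][i] / d, R[j][i] / d
            for k in range(n_cols):
                upper, lower = R[i][k], R[j][k]
                R[i][k] = c * upper + s * lower
                R[j][k] = -s * upper + c * lower
    return R, rotations


A = [[3.0, 1.0, 2.0],
     [4.0, 2.0, 0.0],
     [0.0, 5.0, 1.0]]
R, rotations = givens_triangularize(A)
print(rotations)
```

For an (n x n) matrix the count is (n-1) + (n-2) + ... + 1 = n(n-1)/2, which gives the three rotations quoted for the (3 x 3) case.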


C. MAPPING GIVENS ROTATIONS ALGORITHM INTO A SYSTOLIC ARRAY


The problem of mapping an algorithm into a systolic structure is frequently subject to certain restrictions, because the computed results may be very small numbers that are impossible to represent if fixed point hardware is used. In order to avoid establishing undesirable conditions on the data, one must try to select an algorithm that can be mapped and that can perform computations on any data within the selected range of approximations.

We have already presented the Givens Algorithm. The problem now is to understand how a systolic array can realize it. The reader must be aware that the understanding of the mathematical algorithm does not lead to immediate insight on how the array works. For the original discussion on this algorithm the reader is referred to [Ref. 7].


Presented in the following are the specifications of the cell processors used in this systolic array, the general cell arrangement, and the input data streams. However, the sight of such specifications does not reveal the intricacies of the deduction of which function should be performed by each particular cell element. A possible way to gain insight as to how the algorithm is mapped onto the array is to perform an algebraic validation, which is extremely laborious. For a start, suppose that we want to triangularize a (3 x 3) matrix A. We must premultiply it 3 times by elementary rotation matrices which will turn the elements a(2,1), a(3,1) and a(3,2) into zeros. The first transformation will be

\[
\begin{bmatrix} c & s & 0 \\ -s & c & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a(1,1) & a(1,2) & a(1,3) \\ a(2,1) & a(2,2) & a(2,3) \\ a(3,1) & a(3,2) & a(3,3) \end{bmatrix}
\]

This elementary rotation will turn element (2,1) into zero in the product matrix. We can note that the element (-s) in the transformation matrix occupies the position of the element that will be turned into zero in the transformed matrix. The transformed matrix will become

\[
\begin{bmatrix} a'(1,1) & a'(1,2) & a'(1,3) \\ 0 & a'(2,2) & a'(2,3) \\ a(3,1) & a(3,2) & a(3,3) \end{bmatrix}
\]

which will be operated on by a second transformation matrix

\[
\begin{bmatrix} c & 0 & s \\ 0 & 1 & 0 \\ -s & 0 & c \end{bmatrix}
\begin{bmatrix} a'(1,1) & a'(1,2) & a'(1,3) \\ 0 & a'(2,2) & a'(2,3) \\ a(3,1) & a(3,2) & a(3,3) \end{bmatrix}
\]

and that will generate the matrix

\[
\begin{bmatrix} a''(1,1) & a''(1,2) & a''(1,3) \\ 0 & a'(2,2) & a'(2,3) \\ 0 & a'(3,2) & a'(3,3) \end{bmatrix}
\]

Finally the last transformation

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & c & s \\ 0 & -s & c \end{bmatrix}
\begin{bmatrix} a''(1,1) & a''(1,2) & a''(1,3) \\ 0 & a'(2,2) & a'(2,3) \\ 0 & a'(3,2) & a'(3,3) \end{bmatrix}
\]

will generate the upper triangular matrix

\[
\begin{bmatrix} a''(1,1) & a''(1,2) & a''(1,3) \\ 0 & a''(2,2) & a''(2,3) \\ 0 & 0 & a''(3,3) \end{bmatrix}
\]

It is important to note that the s's and c's are different in each transformation matrix and are calculated the way we have already pointed out, that is, for the first elementary transformation,

d = sqrt(sqr(a(1,1)) + sqr(a(2,1)))

c = a(1,1)/d and s = a(2,1)/d


Now the elements of each transformed matrix should be evaluated by performing the actual multiplications, in order to correlate the resulting algebra with the algorithmic processor equations presented in Figs. 3.1 and 3.2. For this particular algorithm this definitely requires a lot of paperwork. This cannot be accepted as a good method if one wants to understand the general data flow in the systolic array and to check its numeric data. This is why we developed SYSGRAS: to make it possible to study and evaluate how several algorithms found in the literature perform.


D. A NUMERICAL EXAMPLE


In order to test the Givens Algorithm in doing Matrix Triangularization, we have prepared a numeric case study. A system is represented by the matrix equation


2 4 1 x (1) AZ 
5 7 L : x (2) = 3 
52) х (3) 8 


In Appendix C we show how to operate on this equation to triangularize it and to solve it by Back Substitution. Using these numerical data, we will set up the problem for the simulator and be able to check the simulation results against our hand-calculated values.


E. SETTING UP THE PROBLEM FOR SIMULATION


1. Sketching the Graphics 


The first step in preparing the problem for simulation consists of planning the graphics. We have to consider the following points:

- How do we want our systolic array arranged on the screen?

- What kind of shape will be used for the cells (in SYSGRAS we have two options: square and octagonal)?

- The screen coordinates for the cells (SYSGRAS divides the screen into a chessboard, and therefore we have a span of 8x8 positions to place the array cells).

- Which are the input and output ports? This is dependent on the subroutine that implements the cell processor, and so must be compatible with the definition of the processor cell subroutine.

- Which links are used to connect the different ports of the different cells? We need to identify everything so that the correct information can be entered in SYSGRAS.

This may be better understood with an example, and we will continue the development of the systolic array to perform Matrix Triangularization.

For this particular case, we will use three types of processors: a Givens External cell, a Givens Internal cell (both with square roots), and a Buffer cell. The last one is used to help the visualization of the input data array being pumped into the systolic array.

Figures 3.1, 3.2, and 3.3 show which ports are considered the active terminals in the actual implementation in SYSGRAS for these cell processors.

    if x = 0 then begin
        c = 1.0
        s = 0.0
    end
    else begin
        temp = sqrt(r^2 + x^2)
        c = r / temp
        s = x / temp
        r = temp
    end

    Legend: r = residue, x = input datum

Figure 3.1 Givens External Processor Cell

Figure 3.2 Givens Internal Processor Cell

Fig. 3.4 shows how to arrange the cells and their interconnections. It is important to realize that although circular shaped cells are presented in Figs. 3.1 and 3.4, they are implemented in SYSGRAS as octagonal shaped cells. This approach was adopted to improve the computational speed of the simulator. However, it has been decided to keep the standard circular shape in Figs. 3.1 and 3.4. An additional detail for the reader is the fact that cells 10 to 13 in Fig. 3.4, the buffer cells, are not shown exactly the same way as in Fig. 3.3. This is because a buffer cell is implemented in SYSGRAS with three channels, but for Matrix Triangularization only one channel is used (and so only one output port is shown in Fig. 3.4). Attention should also be paid to the fact that the arrows shown in Fig. 3.4 represent the links that have actually been used in our simulation. Cells number 1, 2 and 4 of that figure are of the same type as the other square shaped cells that appear in the same figure, although there is no arrow drawn on their right side. The reason is that a third order system is used in the simulation, and so there is no need to extend the limits of the array beyond those cells.
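The behavior of the two Givens cells can be sketched in a few lines. The external cell follows the rule of Fig. 3.1; the internal cell update used here is the usual formulation from the literature on this array and should be read as an assumption, since its equations are not reproduced in the text. Class names, the 2 x 2 test matrix and the absence of time skew are all simplifications made for this illustration.

```python
import math


class GivensExternalCell:
    """Boundary cell: generates (c, s) and keeps the diagonal element r."""

    def __init__(self):
        self.r = 0.0

    def step(self, x):
        if x == 0.0:
            return 1.0, 0.0
        temp = math.sqrt(self.r ** 2 + x ** 2)
        c, s = self.r / temp, x / temp
        self.r = temp
        return c, s


class GivensInternalCell:
    """Internal cell (assumed rule): rotates its stored r against the input x."""

    def __init__(self):
        self.r = 0.0

    def step(self, x, c, s):
        x_out = -s * self.r + c * x
        self.r = c * self.r + s * x
        return x_out


# Feed the rows of the hypothetical matrix [[3, 1], [4, 2]], one row per step.
external, internal = GivensExternalCell(), GivensInternalCell()
passed_down = None
for first, second in [(3.0, 1.0), (4.0, 2.0)]:
    c, s = external.step(first)
    passed_down = internal.step(second, c, s)
print(external.r, internal.r, passed_down)
```

After both rows have passed, the external cell holds r(1,1), the internal cell holds r(1,2), and the last value passed down is what the next boundary cell would receive to form r(2,2).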

This type of preparation is very helpful when entering the data, because SYSGRAS asks for a large amount of information to be entered interactively. If every detail is planned beforehand, the interaction between the user and the simulator becomes faster and easier.


2. Planning the Input Data Array 


Once the systolic array has been planned and sketched, the user must prepare the test data that exercise the simulator. This is perhaps the most important step in the simulation. At this point the only rule to follow is that a time shift as required by the algorithm has to be respected. The way color information is assigned to the entering data is completely up to the user, and the greater his/her creativity, the better the simulation effect and, consequently, the easier it is to interpret the results.

Figure 3.5 Input Data Array Arrangement

Fig. 3.5 shows a possible way to prepare the data. We can identify in that figure the numbers from our numeric example referred to in the last subsection. The time shift that can be seen is necessary to provide the required synchronism for the systolic processing. In some algorithms, the output data to the external world must also be collected in a synchronous fashion. But, for the particular case of Matrix Triangularization, as soon as the final result is achieved, each datum is frozen in its cell and waits for a pumping-out operation whose connection links are not being considered in our simulation.
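The time shift can be pictured as a skewed schedule in which each input column starts later than the one to its left. The sketch below is a generic illustration with hypothetical names and data; the lag of exactly one clock cycle per column is an assumption and should be checked against the arrangement actually required by the algorithm.

```python
def skewed_input(rows):
    """Per clock cycle, the value presented to each input column (None = idle)."""
    n_rows, n_cols = len(rows), len(rows[0])
    schedule = []
    for t in range(n_rows + n_cols - 1):
        cycle = []
        for col in range(n_cols):
            row = t - col  # column `col` lags by `col` clock cycles
            cycle.append(rows[row][col] if 0 <= row < n_rows else None)
        schedule.append(cycle)
    return schedule


for t, cycle in enumerate(skewed_input([[1, 2, 3], [4, 5, 6]]), start=1):
    print("clock cycle", t, cycle)
```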


IV. THE ANALYSIS OF THE SIMULATIONS 


A. DIVIDED DIFFERENCES 


Before going into the detailed analysis of the Matrix Triangularization, some simpler algorithms are presented.

In numerical interpolation, we need to compute the divided differences from a set of points and then form a polynomial from these divided differences. The problem of mapping the algorithm for calculating these differences into a systolic array structure has been recently addressed in [Ref. 10]. In that paper, Li and Smith propose a systolic architecture consisting of triangular cells, some pointing upward and some downward. According to that paper, the cells have the same internal architecture but must perform differently depending on their orientation with respect to the data flow. Certainly, this poses the problem of sensing the direction of the data flow and implementing some additional logic in the cells to change their function accordingly. Li and Smith also discuss some implementation problems in their paper, and point out that their actual cell "is not exactly the same as described" [Ref. 10: page 542].

Their algorithm is implemented in SYSGRAS, but the basic cell has been modified by grouping one upward triangle and one downward triangle into the same cell. It was noticed that the upward cell was in fact acting just as a traffic-orienting buffer, and no additional information was being incorporated into the data flow. It would not be cost effective to have that kind of processor, since the introduced complexity was due to the need to establish a homogeneous type of architecture based on triangles. Therefore, our basic cell became that shown in Fig. 4.1, and the systolic architecture, almost the same as that proposed by Li and Smith, appears in Fig. 4.2.

Figure 4.1 Divided Differences Cell Processor

We have investigated the problem of finding the divided differences of the set of points (1.0,1.0), (1.8,2.2), (3.0,3.3), (4.3,3.5) and (5.3,4.1) in the x-y plane. According to [Ref. 10: page 539], the first level divided differences are

y'(1) = (y(2)-y(1))/(x(2)-x(1)) = (2.2-1.0)/(1.8-1.0) = 1.5000
y'(2) = (y(3)-y(2))/(x(3)-x(2)) = 0.91666
y'(3) = (y(4)-y(3))/(x(4)-x(3)) = 0.15385
y'(4) = (y(5)-y(4))/(x(5)-x(4)) = 0.6000

Continuing with the calculations, the second level divided differences are

-0.29167
-0.30513
0.19398

and the third level divided differences are

-0.00408
0.14260

and the fourth level is 0.03411. This suggests a kind of pyramid structure that can be seen in Figs. 4.2 to 4.5. The calculation of the differences is performed in four clock cycles, one represented in each figure. The bottom row displays the first order differences, the row above it the second order, and so on up to the top cell. We have related each point (P1-P5) to a primary color, that is, P1 is blue, P2 is red, P3 is green, P4 is blue, and P5 is red. In Fig. 4.2 we can see the result of the first step of calculations, with the mixing of these colors at the bottom cells. Going through each step and comparing it with the mathematical formulation, we get a better understanding of the data interaction. The final results in Fig. 4.5 should be compared with the above calculated values.

Figure 4.3 Divided Differences Array after Clock Cycle 2

Figure 4.4 Divided Differences Array after Clock Cycle 3

Figure 4.5 Divided Differences Array after Clock Cycle 4
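The pyramid of divided differences can be reproduced with a short sequential routine, one level per pass, in the same order in which the array produces one level per clock cycle. This is a sketch with a hypothetical function name; the five points in the code are the ones that reproduce the first level quotients 1.5000, 0.91666, 0.15385 and 0.6000 of the example.

```python
def divided_differences(xs, ys):
    """Return the levels of divided differences, first level first."""
    levels = [list(ys)]
    for k in range(1, len(xs)):
        prev = levels[-1]
        levels.append([
            (prev[i + 1] - prev[i]) / (xs[i + k] - xs[i])
            for i in range(len(prev) - 1)
        ])
    return levels[1:]


xs = [1.0, 1.8, 3.0, 4.3, 5.3]
ys = [1.0, 2.2, 3.3, 3.5, 4.1]
for level, values in enumerate(divided_differences(xs, ys), start=1):
    print(level, [round(v, 5) for v in values])
```

Each level has one entry fewer than the level below it, which is the pyramid shape seen on the screen.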


B. MATRIX-MATRIX MULTIPLICATION


This is another simulation problem that has been tried on SYSGRAS. The algorithm was presented in [Ref. 1: page 9]. Presented in Fig. 4.7 is the arrangement of the systolic array and how the data is pumped into it. The elements of matrix A can be seen flowing towards the lower right and those of matrix B flowing towards the lower left. The resultant matrix C is pumped upwards. We have performed a simulation to compute the matrix product AB = C as

\[
\begin{bmatrix} 1 & 2 & 0 \\ 4 & 7 & 1 \\ 8 & 2 & 3 \end{bmatrix}
\begin{bmatrix} 2 & 1 & 9 \\ 3 & 7 & 4 \\ 0 & 10 & 6 \end{bmatrix}
=
\begin{bmatrix} 8 & 15 & 17 \\ 29 & 63 & 70 \\ 22 & 52 & 98 \end{bmatrix}
\]


The scheme used for systolic computation has an advantage due to the fact that a great number of matrices that are used in typical problems are band matrices [Ref. 9]. Consequently, the systolic array does not need to include as many cells as it would have to for the case of full matrices. In this numeric example, in order to restrict our array to a small size so that it can be seen on the screen (as in Fig. 4.8), we decided to set elements a(1,3) and b(3,1) to zero. This way, a systolic array of small dimensions can multiply larger matrices with a restricted nonzero band.

A single type of cell is used in this algorithm. It is called the Inner Product Step Processor. Its geometry and algorithmic definition are shown in Fig. 4.6. This same type of cell will be presented later in another algorithm implementation.

In this simulation problem, different colors are attributed to different rows of matrix A and to different columns of matrix B. As we know, each element of the product matrix C will be generated by a combination of one row of A and one column of B. When these elements join each other in a systolic cell, we will be able to identify exactly which combination is taking place at each cell in the space-time frame. We present here the color coding that was adopted:

\[
\begin{bmatrix}
\text{blue} & \text{blue} & \text{blue} \\
\text{red} & \text{red} & \text{red} \\
\text{green} & \text{green} & \text{green}
\end{bmatrix}
\begin{bmatrix}
\text{blue} & \text{red} & \text{green} \\
\text{blue} & \text{red} & \text{green} \\
\text{blue} & \text{red} & \text{green}
\end{bmatrix}
=
\begin{bmatrix}
\text{blue} & \text{magenta} & \text{cyan} \\
\text{magenta} & \text{red} & \text{yellow} \\
\text{cyan} & \text{yellow} & \text{green}
\end{bmatrix}
\]

    r = inp(1) * inp(2) + inp(3)
    out(3) = r ; processed output

Figure 4.6 Inner Product Step Processor Cell
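The inner product step is small enough to sketch directly. The function below follows the rule of Fig. 4.6; the function name is hypothetical, and the data are the third row of A and the second column of B of the numeric example, so the running value traces the partial results of c(3,2) as it passes through three successive cells.

```python
def inner_product_step(a, b, r_in):
    """One cell operation: multiply the two operands and add the incoming sum."""
    return a * b + r_in


row_of_a = [8, 2, 3]        # a(3,1), a(3,2), a(3,3)
column_of_b = [1, 7, 10]    # b(1,2), b(2,2), b(3,2)

partial = 0
partials = []
for a, b in zip(row_of_a, column_of_b):
    partial = inner_product_step(a, b, partial)
    partials.append(partial)
print(partials)
```

In the array the three steps happen in three different cells on consecutive clock cycles; the loop only reproduces the arithmetic, not the timing.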


TO show how colors can help in understanding the mecha- 
nism cf rultiplicaticn in the systolic array, we will track 
the generation of the element c(3,2). We see from the above 
color coding that с (3,2) is a yellow element, Since it 
results from the ccrbination of a green row anda гей 


column. From mathematics, 
ЖЕРІ ad (oy, 1). (1,2) + a(3,2).b(2,2) + a(3,3).b1(3,2) 


Figure 4.7 Systolic Array for Matrix-Matrix Multiplication
Figure 4.8 Matrix-Matrix Multiplication after Clock Cycle 1
Figure 4.9 Matrix-Matrix Multiplication after Clock Cycle 2
Figure 4.10 Matrix-Matrix Multiplication after Clock Cycle 3
Figure 4.11 Matrix-Matrix Multiplication after Clock Cycle 4
Figure 4.12 Matrix-Matrix Multiplication after Clock Cycle 5
Figure 4.13 Matrix-Matrix Multiplication after Clock Cycle 6
Figure 4.14 Matrix-Matrix Multiplication after Clock Cycle 7
Figure 4.15 Matrix-Matrix Multiplication after Clock Cycle 8
Figure 4.16 Matrix-Matrix Multiplication after Clock Cycle 9


Figure 4.7 shows a schematic diagram that corresponds to
clock cycle number 1. The same cycle is also shown in
Fig. 4.8, in which elements a(1,1) and b(1,1) have entered
cells number 14 and 15 respectively. It can also be seen
that the element a(3,1) (green element) will enter the array
(at cell 06) at clock cycle 3 (see Fig. 4.7), while element
b(1,2) (red element) will enter at clock cycle 2 (at cell 12)
(see Fig. 4.9). After that, they will run through the array
and will meet each other at cell 02 at clock cycle 5 (see
Figure 4.12). This meeting generates the first partial
result a(3,1).b(1,2). This can be verified to be 8x1=8 by
our numeric example and by the contents of cell 02 at clock
cycle 5. This partial result with yellow color (identified
as c(3,2) for convenience) is pumped upwards to cell 09.
Performing a similar verification, it can be seen that
elements a(3,2), b(2,2) and that partial result of c(3,2)
will meet at cell 09 at clock cycle 6 (see Fig. 4.13). They
will then generate a second partial result
     a(3,1).b(1,2) + a(3,2).b(2,2)
which is 8x1+2x7=22, as seen in Fig. 4.13 at cell 09. By
similar reasoning, the final result 8x1+2x7+3x10=52 is
generated in cell 14 at clock cycle 7 (see Fig. 4.14).
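
The build up of c(3,2) traced above can be reproduced with a
few lines of Python. This is only a sketch: the function name
inner_product_step is an illustrative choice, and the code
ignores the clock timing and cell numbering of the simulator.
It uses the row and column values quoted in the text
(8x1 + 2x7 + 3x10).

```python
def inner_product_step(a, b, c_in):
    """One inner product step: add a * b to the passing partial result."""
    return c_in + a * b


# Row 3 of A and column 2 of B, as quoted in the text.
a_row3 = [8, 2, 3]
b_col2 = [1, 7, 10]

partial = 0
trace = []
for a, b in zip(a_row3, b_col2):
    partial = inner_product_step(a, b, partial)
    trace.append(partial)

print(trace)  # [8, 22, 52]
```

The three printed values are the contents that the text
reports for the yellow element at clock cycles 5, 6 and 7.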

It can be seen that matrix C is symmetrical with respect
to the color coding. Symmetrical elements like C(3,2) and
C(2,3) are computed in parallel and pumped up side by side.
They are therefore easily tracked with the help of colors,
and the whole operation can be better visualized.


C. MATRIX TRIANGULARIZATION BY GIVENS ROTATIONS 


We will provide a brief review of this problem. The
nomenclature to be used is the same as that in Chapter III.
The reader is also referred to Appendix C, "A Numerical
Example for Matrix Triangularization", which uses the same
data as that described here.


The linear system that needs to be solved is
     AX = B
where A is an nxp matrix, X is a column vector of p elements,
and B is a column vector of n elements.

We will premultiply the matrix A by a transformation
matrix Q such that the product matrix becomes an upper
triangular matrix R. To keep the matrix equality it is
necessary to premultiply vector B by matrix Q as well. This
will result in the product vector QB. This operation is
shown below:
     QAX = QB
and as R = QA, it becomes
     RX = QB


The Givens Rotations in the algorithm under simulation
triangularize the matrix A and compute the vector QB
concurrently. This is possible because
     Q(A|B) = (QA)|(QB) = R|(QB)
where the operator "|" performs matrix concatenation. For
clarity, if we define

         2 4 1               12
     A = 5 7 4    and    B =  3
         3 0 1                8

then A|B becomes

           2 4 1 12
     A|B = 5 7 4  3
           3 0 1  8
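
The identity Q(A|B) = R|(QB) can be illustrated with a
sequential Python sketch that absorbs the rows of A|B one at
a time, in the order in which they enter the array. It is not
the SYSGRAS code: the function name triangularize is
hypothetical, the clocking of the array is ignored, and the
rotation formulas are the usual Givens ones, assumed here
because the cell algorithms are only given in the figures.

```python
import math


def triangularize(augmented, p):
    """Reduce the rows of A|B (p unknowns) to the p rows of R|QB."""
    R = [[0.0] * (p + 1) for _ in range(p)]
    for row in augmented:
        z = [float(v) for v in row]
        for i in range(p):
            x = z[i]
            if x == 0.0:
                continue                 # nothing to annihilate here
            r_new = math.hypot(R[i][i], x)
            c, s = R[i][i] / r_new, x / r_new
            for j in range(i, p + 1):
                r_old = R[i][j]
                R[i][j] = c * r_old + s * z[j]   # rotated stored element
                z[j] = c * z[j] - s * r_old      # element passed downwards
    return R


A_B = [[2, 4, 1, 12],
       [5, 7, 4, 3],
       [3, 0, 1, 8]]

R_QB = triangularize(A_B, 3)
for row in R_QB:
    print(" ".join(f"{v:8.3f}" for v in row))
```

The first three columns of the printed result hold R and the
last column holds QB, so both are obtained in the same pass.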


As we have shown in Chapter III, the simulation uses the
cell structure seen in Fig. 3.4 and the test data seen in
Figure 3.5 (the same as that used in the numerical example
of Appendix C). The picture that appears on the screen at
the start of the simulation is shown in Fig. 4.17, which the
reader should compare with Fig. 3.4 in Chapter III.
Fig. 4.17 shows the identification of each cell at its lower
right corner. Cells numbered 13 to 10 represent the data
wavefront to be pumped into the systolic array. They are not
part of the array, but only buffer cells that allow the data
about to be pumped into the array to be presented on the
screen. The final elements of vector QB are calculated in
cells 04, 02, and 01. The final elements of matrix A are
calculated in cells numbered 09, 08, 06, 07, 05 and 03.

As shown in Appendix C, the values for AX = B are as
follows, and A|B is as shown above.

     2 4 1     x(1)     12
     5 7 4  .  x(2)  =   3
     3 0 1     x(3)      8


The test data are pumped into the array as wavefronts
that can be seen in cells 13 to 10 just before being entered
(the reader can refer to Fig. 3.5 to see the several
wavefronts, each one corresponding to a different color).
There is a time shift among the elements of the same row for
the matrix under triangularization (compound matrix A|B).
This displacement is needed to establish the correct timing
for the systolic operation. In this simulation, we decided
to assign a different color to each row of A|B (as shown in
Fig. 3.5) and to keep track of the flow of the colored
elements through the array. This flow can help us to
understand, in a sequence of space-time frames, how the
whole array behaves.

The effect of the successive clock cycles can be seen in
Figs. 4.18 to 4.26. In Fig. 4.18, element a(1,1) of matrix A
enters cell 09. We can see element a(2,1) in cell 13 and
element a(1,2) in cell 12. They are ready to be clocked in
at the next clock cycle. This happens as seen in Fig. 4.19,
where cells 09 and 08 display the stored result of the
computation performed. Notice that cells 09, 07, and 03 act
like a reflector for the data wavefront that is going
downwards. They rotate the data flow direction by 90 degrees
counterclockwise. As a result of the interaction between the
wavefront that goes towards the right and the one going
downwards, there is a resulting wavefront that flows towards
the lower right. This is the important observation because
it shows how the effect of the input data spreads over the
array (this is why we decided to associate this wavefront
with identical colors).

The wavefront flowing towards the right carries information
about the angle by which the row elements must be rotated.
This information must arrive at each cell just in phase with
the matrix element being rotated, which is being transferred
to each cell by the downward wavefront. At each clock cycle
a new rotation angle is calculated at the inclined boundary
cells (Givens External Cells, namely 09, 07 and 03) and
transmitted to the cells to their right in the same row
(that is, from cell 09 to cells 08, 06 and 04 in the upper
row, from cell 07 to cells 05 and 02 in the second row, and
from cell 03 to cell 01 in the lower row). Partial results
are generated at each systolic array element at each clock
cycle. At the moment when the downward wavefront starts
feeding zeros into a systolic array element, the element
freezes its content and does not modify it any more. We have
selected the black color to indicate that the input data are
bringing no information. The black wavefront flowing through
the systolic array acts as a freezing wavefront.

Figs. 4.19 to 4.26 show the effect of the remaining clock
cycles. In Fig. 4.26 we have the final result of the
triangularization frozen into the cells. The upper
triangular matrix R and the vector QB have been computed.
They can be checked against the values shown in Appendix C,
"A Numerical Example for Matrix Triangularization". The
computed results are ready to be used in the Back
Substitution to solve the system equations. In order to
transfer the data from the cells to the hardware that
performs the Back Substitution operation, a special
connection, which is not shown here, becomes necessary. It
is not required, however, for the understanding of the
Givens Rotations process.
Figure 4.17 Matrix Triangularization Array Initialization
Figure 4.18 Matrix Triangularization Array after Clock Cycle 1
Figure 4.19 Matrix Triangularization Array after Clock Cycle 2
Figure 4.20 Matrix Triangularization Array after Clock Cycle 3
Figure 4.21 Matrix Triangularization Array after Clock Cycle 4
Figure 4.22 Matrix Triangularization Array after Clock Cycle 5
Figure 4.23 Matrix Triangularization Array after Clock Cycle 6
Figure 4.24 Matrix Triangularization Array after Clock Cycle 7
Figure 4.25 Matrix Triangularization Array after Clock Cycle 8
Figure 4.26 Matrix Triangularization Array after Clock Cycle 9
We will now present a different perspective on how the
algorithm performs. The geometrical interpretation of Givens
Rotations is extremely helpful in providing better insight.
A matrix can be considered as representing a set of vectors
(as many as the number of columns) in an n-dimensional space
(n equals the number of rows). If the matrix has only three
rows, its columns are vectors of a three dimensional space,
and the elements of the first row are the components of the
column vectors along the x-axis. The rotation operation does
not rotate the vectors. The reference frame (coordinate
axes) is the one that is rotated and, as a consequence, the
description of the vectors in terms of their components with
respect to the new reference becomes different. When a
matrix is rotated to become upper triangular, the first
column is transformed in such a way that only element (1,1)
will be nonzero. This means that the first column vector
lies along the x-axis of the new reference frame. As the
vector is the same as before, the numerical value of the new
element (1,1) must be equal to the length of the vector.
Element (1,2) of the rotated matrix, for example, is the
x-component of the second column vector in the new reference
frame. Since the relative position of both vectors is
unaltered, the new value of that element must be equal to
the projection of the second column vector over the first
column vector, since this last one is now along the new
x-axis. The geometrical interpretation can be extended to
all elements of the array.

We will track the build up of the numerical values of
cells 09 (which stores element (1,1)) and 08 (which stores
element (1,2)). This allows us to study the operation of
both types of cells used in this algorithm (cell 09 is an
outer cell and cell 08 is an inner cell). It will also
provide a comparison between the geometric interpretation
presented above and the algebraic values shown in the
graphics.
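
The geometric statements above can be checked directly on the
test data, without performing any rotation. The short Python
sketch below (variable names are illustrative) computes the
length of the first column of A and the projection of the
second column onto it. These should match the final contents
of cells 09 and 08.

```python
import math

# Columns 1 and 2 of the test matrix A used in this chapter.
col1 = [2.0, 5.0, 3.0]
col2 = [4.0, 7.0, 0.0]

# Element (1,1): length of the first column vector.
length_col1 = math.sqrt(sum(v * v for v in col1))

# Element (1,2): projection of the second column onto the first.
projection = sum(u * v for u, v in zip(col1, col2)) / length_col1

print(f"element (1,1) = {length_col1:.3f}")  # element (1,1) = 6.164
print(f"element (1,2) = {projection:.3f}")   # element (1,2) = 6.976
```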
Figure 4.27 Givens External Processor Cell








We will start with cell 09. From Fig. 4.27 we see that
this cell receives the elements of the "nonrotated" first
column vector, computes the rotation angle (calculating as
outputs its cosine c and its sine s), and stores the
"rotated" x-component r of the first vector. The other
components are obviously zeros and correspond to the other
elements of the first column (cells of the lower array that
are not shown because their contents are always zeros). The
rotation angle information is passed as output to cell 08 to
be used to "rotate" the second vector. Table I shows how the
parameters r, c, and s of this cell respond to the external
inputs. These values can be verified with the help of the
cell algorithm presented in Fig. 4.27. The description of
the operation can be followed through the sequence of
Figs. 4.28 to 4.30.


                         TABLE I
         Time Description of Outer Cell Operation

  Cycle   Input z        Output c   Output s   Residue r
    0     0.0 (black)     1.000      0.000       0.000
    1     2.0 (blue)      0.000      1.000       2.000
    2     5.0 (red)       0.371      0.928       5.385
    3     3.0 (green)     0.874      0.487       6.164
    4     0.0 (black)     1.000      0.000       6.164
Figure 4.28 Reference Frame before Rotations


At clock cycle 0 no input has been received and so there
is nothing to rotate. The rotation angle is zero (c=1.0 and
s=0.0). At cycle 1, the first element is received. The first
element is the old x-component of the first column vector.
As it lies along the x-axis, the reference does not need to
be rotated now. However, when this occurs, because of the
way the algorithm is implemented (we will see the reason
when analysing cell 08), an angle of 90 degrees is reported
to cell 08 (c=0.0 and s=1.0), although the value r stored in
cell 09 is equal to the input. At clock cycle 2, the
y-component of the old first column vector is entered. Now a
rotation is necessary to keep the x-axis of the reference
frame along the first column vector. The rotation angle is
computed (c=0.371 and s=0.928) and the reference frame is
rotated (about the z-axis). The new x-component (stored in
r) is the square root of the sum of the squares of the old x
and y-components, that is 5.385 (corresponding to the
projection of the first column vector over the x-y plane).
At cycle 3, the z-component is entered. As the last rotation
resulted in a vector along the x-axis, the combination of
this with the just entered z-component will result in a
vector in the x-z plane deviated from the x-axis because of
the newly arrived z-component. Again a rotation is necessary
to keep the x-axis along the first column vector. The new
x-component is 6.164. At clock cycle 4 a black datum is
received (a zero value) and that freezes the contents of
cell 09 with its final value. No more rotations are required
and the output of the cell reports a rotation angle of zero
(c=1.0 and s=0.0).
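
The behaviour described above can be replayed with a small
Python sketch of the outer cell. The cell algorithm itself is
given only in Fig. 4.27, so the update rule below is an
assumption: it is the usual Givens boundary-cell rule, chosen
because it reproduces Table I. The class and variable names
are illustrative.

```python
import math


class GivensExternalCell:
    """Boundary cell: keeps the residue r and emits the rotation (c, s)."""

    def __init__(self):
        self.r = 0.0

    def step(self, z):
        if z == 0.0:
            return 1.0, 0.0              # black datum: zero rotation, r frozen
        r_new = math.hypot(self.r, z)    # sqrt(r*r + z*z)
        c, s = self.r / r_new, z / r_new
        self.r = r_new
        return c, s


cell_09 = GivensExternalCell()
history = []
# Inputs of Table I: black, a(1,1), a(2,1), a(3,1), black.
for cycle, z in enumerate([0.0, 2.0, 5.0, 3.0, 0.0]):
    c, s = cell_09.step(z)
    history.append((c, s, cell_09.r))
    print(f"{cycle}  z={z:.1f}  c={c:.3f}  s={s:.3f}  r={cell_09.r:.3f}")
```

Note that the first nonzero input gives c=0 and s=1 without
any special case, which is the 90 degree rotation mentioned
in the text.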

Now we will follow the build up of the contents of cell
08 (the right side neighbor of cell 09). Cell 08 stores
element (1,2), the x-component of the second column vector.
As done before, we present Table II with the inputs and the
values stored in this cell at each clock cycle. In this case
we will disregard the numerical output of this cell because
we will concentrate on the build up of the residue r in this
cell. From Fig. 4.31, this cell receives as input the
rotation angle (inputs c and s) by which the reference frame
has changed, to be used to compute the new x-component of
the second column vector (to be stored as r). The other
input refers to the elements of the "nonrotated" second
column vector (input z). The outputs of this cell are the
rotation angle (c and s are passed as output to the right
neighbor cell 06) and rotation information to the neighbor
cell immediately below cell 08 to "rotate" the other
components of the column vector as a result of the reference
frame rotation.


Figure 4.29 Reference Frame after First Rotation
Figure 4.30 Reference Frame after Second Rotation
Figure 4.31 Givens Internal Processor Cell


The operation of this cell is more difficult to
visualize. Figure 4.32 will be used for the explanation. It
only shows the x-y plane. Suppose the second column vector
is A, as seen in that Figure. The old reference frame is
x1-y1. The components of A with respect to that frame are ax
and ay. In our numerical example, ax=4.0 is the first input
z(i) to the cell. The second input z(i) to the cell, on the
following cycle, is ay=7.0. Suppose the reference frame is
rotated about the z-axis to position x2-y2 to keep the
x-axis along the first column vector, which is shown as B.


                         TABLE II
         Time Description of Inner Cell Operation

  Cycle   Input c         Input s         Input z       Residue r
    0     1.000 (black)   0.000 (black)   0.0 (black)     0.000
    1     1.000 (black)   0.000 (black)   0.0 (black)     0.000
    2     0.000 (blue)    1.000 (blue)    4.0 (blue)      4.000
    3     0.371 (red)     0.928 (red)     7.0 (red)       7.985
    4     0.874 (green)   0.487 (green)   0.0 (green)     6.976
    5     1.000 (black)   0.000 (black)   0.0 (black)     6.976


Figure 4.32 Geometric Interpretation of a Rotation


Instead of evaluating the components of A in this new frame,
let us study the effect on its components ax and ay. In
this new frame, ax will be decomposed into components axx
(over the new x-axis) and axy (over the new y-axis).
Similarly, ay will be decomposed into components ayx (over
the new x-axis) and ayy (over the new y-axis). The component
of A over the new x-axis is the sum of axx and ayx. This
will be the new element (1,2). As seen in Fig. 4.32,

     axx = ax . cos(alfa)   and   ayx = ay . sin(alfa)

and as element (1,2) is

     r = axx + ayx

it results

     r = ax . cos(alfa) + ay . sin(alfa)

As ax was the previous r and ay is the newly arrived input
z(i), in computer algorithmic language we have

     r = c(i) * r + s(i) * z(i)

as given in Fig. 4.31. Substituting the values of clock
cycle 2 into this equation, we find

     r = 0.0 * 0.0 + 1.0 * 4.0
     r = 4.0

and this can be checked against the above table. At clock
cycle 3, the values give us

     r = 0.371 * 4.0 + 0.928 * 7.0
     r = 7.985


that also checks against the above table. Doing the same for
clock cycle 4, we get r=6.976. This gives the final value of
the x-component of the second column vector. The rotation,
however, will affect all other components. Let us describe
how this occurs. The component of A over the new y-axis is

     z(o) = ayy - axy
     z(o) = ay . cos(alfa) - ax . sin(alfa)

and in algorithmic language,

     z(o) = c(i) * z(i) - s(i) * r

where r is the value stored before the update of this cycle.
This information z(o) is passed to the cell immediately
below to generate the new y-component of the second column
vector.
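
The two update equations derived above can be exercised with
a short Python sketch of the inner cell. The class name is
illustrative, and the rotation inputs are recomputed from the
first column of A (2, 5, 3) at full precision instead of
being copied from the rounded entries of Table II, so that
the residues agree with the table to three decimals.

```python
import math


class GivensInternalCell:
    """Inner cell: rotates the stored residue r with the incoming z."""

    def __init__(self):
        self.r = 0.0

    def step(self, c, s, z):
        z_out = c * z - s * self.r       # uses r as stored before the update
        self.r = c * self.r + s * z
        return z_out


r29, r38 = math.sqrt(29.0), math.sqrt(38.0)
# (c, s, z) seen by cell 08 at clock cycles 0 to 5.
inputs = [
    (1.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 4.0),
    (2.0 / r29, 5.0 / r29, 7.0),
    (r29 / r38, 3.0 / r38, 0.0),
    (1.0, 0.0, 0.0),
]

cell_08 = GivensInternalCell()
residues = []
for cycle, (c, s, z) in enumerate(inputs):
    z_out = cell_08.step(c, s, z)
    residues.append(cell_08.r)
    print(f"{cycle}  r={cell_08.r:.3f}  z(o)={z_out:.3f}")
```

The z(o) values printed at cycles 3 and 4 are the elements
handed to the cell immediately below cell 08.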

A further point to be noticed is that if the rotation
angle input to an inner cell is zero (c=1.0 and s=0.0), the
cell will not change the value stored at r. This is why the
rotation angle is reported as 90 degrees, as mentioned
previously, when the first element is received at cell 09.
This triggers cell 08 to receive and store the z input at
the following clock cycle. At the end of clock cycle 2, the
z=4.0 input is stored. No rotation is performed, so r=4.0.
The following clock cycles modify the contents of this cell
because of the changes in the reference frame. As soon as
the input z freezes (at clock cycle 5), the residue r of the
cell also freezes. If the outputs of cell 09 and the inputs
to cell 08 are compared, the transmission delay of one clock
cycle from one cell to another can be observed. It can also
be noticed that the inputs to cell 08 always have the same
color. The simulation was purposely designed this way to
show the need for synchronization between the wavefronts
that propagate through the array.

Certainly we cannot present all this complexity with the
simulator. The algorithm of Givens Rotations is the most
complex that we have traced in our survey. However, the
simulator has been a fundamental tool to perform a numerical
study and to verify the correctness of the algorithm.


D. BACK SUBSTITUTION


We will use the results of the previous section as input
data for the simulation described in this section. This way,
with these two sections, we can go through the complete
problem described in Appendix C, "A Numerical Example for
Matrix Triangularization", that is, the search for the
vector X in the matrix equation AX=B. The process of Back
Substitution
is described in (Ref. 7: page 2%). 

One of the cells used in this algorithm is shown in
Figure 4.33. The above reference presents the cell shown
in Fig. 4.33 as a circular shaped cell named Back
Substitution Cell. Since SYSGRAS would take too much time to
draw circular shapes, an octagonal shape is used instead
(see cell number 05 in Figure 4.37).

The other type of cell used in this algorithm is the
same Inner Product Step Processor that was presented in
Fig. 4.6 to do Matrix-Matrix Multiplication. This is an
interesting aspect of systolic arrays: many algorithms that
appear in the literature are implemented with the same types
of cells. However, the geometry of the array, in this
algorithm, requires the cell to be square shaped. Although
the cell is functionally the same as shown in Fig. 4.6, we
present it again in Figure 4.34 to match the way Fig. 4.35
represents it as part of the structure (see cells 04 and
02).

     if a <> 0 then
         r = (b-y) / a
     else r = 0.0

     x = r ; processed output


Figure 4.33 Back Substitution Main Cell
Figure 4.34 Inner Product Step Processor Cell


The mathematical background for solving a triangular
linear system involving a lower triangular array has been
presented by Kung in (Ref. 1: page 19). We will adopt his
explanation here.

Suppose the system equation to be solved is
     RX = D
where R = (r(i,j)) is a nonsingular nxn lower triangular
matrix and D is an n-vector, both being given. To compute
the vector X, we can use Forward Substitution as follows:

     repeat
         y(i,k+1) = y(i,k) + r(i,k).x(k)
         k = k + 1
     until k = i

     x(i) = (d(i) - y(i,i)) / r(i,i)

The above algorithm can be used to calculate the
elements of vector X in the sequence x(1), x(2), ... and so
on. This algorithm can be implemented in a systolic array,
as will be shown. Y = (y(i,k)) is a vector of partial
results that allows the recurrence to build up.
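
The recurrence can be written out as a short Python sketch.
It assumes the starting value y(i,1) = 0 used in Kung's
formulation, uses 0-based indices, and runs sequentially
rather than on an array. The function name and the small
lower triangular system are illustrative and do not come from
the thesis data.

```python
def forward_substitution(R, d):
    """Solve R x = d for a nonsingular lower triangular R."""
    n = len(d)
    x = [0.0] * n
    for i in range(n):
        y = 0.0                          # y(i,1) = 0
        for k in range(i):
            y += R[i][k] * x[k]          # y(i,k+1) = y(i,k) + r(i,k).x(k)
        x[i] = (d[i] - y) / R[i][i]      # x(i) = (d(i) - y(i,i)) / r(i,i)
    return x


# Illustrative lower triangular system (values are not from the thesis).
R = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, 2.0, 5.0]]
d = [4.0, 10.0, 25.0]

print(forward_substitution(R, d))  # [2.0, 2.0, 3.0]
```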

In our case, since the interest is in Back Substitution,
we will simply enter the data into the systolic array in an
order reverse to that proposed by Kung in his paper. Let us
make this point clearer. We do Forward Substitution when
we have a lower triangular array. In this case we solve
first for x(1), next for x(2) and so on. Following the
sequence proposed by Kung, vector X should be pumped into
the systolic array as x(1), x(2) and x(3). The same holds
for vector QB. Since here we are doing Back Substitution, we
enter the vector X in the reverse sequence x(3), x(2), and
x(1). QB is also entered the same way. The way matrix R is
pumped into the systolic array is also modified with respect
to that proposed by Kung. For instance, the first of its
elements to be pumped into the systolic array is r(3,3) (see
cell 10 in Fig. 4.36), which is required to compute x(3). In
the Forward Substitution method, the first element pumped
should be r(1,1) to compute x(1). The other elements of R
are rearranged accordingly.
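
The reversed order of computation can be seen in the
following Python sketch of Back Substitution for an upper
triangular system. The function name, the returned list of
solved indices, and the numeric system are all illustrative
assumptions. The data of the thesis are in Appendix C.

```python
def back_substitution(R, qb):
    """Solve R x = qb for a nonsingular upper triangular R."""
    n = len(qb)
    x = [0.0] * n
    order = []                           # 1-based indices, in solving order
    for i in reversed(range(n)):
        # Partial result built from the elements of X already known.
        y = sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (qb[i] - y) / R[i][i]
        order.append(i + 1)
    return x, order


# Illustrative upper triangular system (values are not from the thesis).
R = [[2.0, 1.0, 3.0],
     [0.0, 4.0, 2.0],
     [0.0, 0.0, 5.0]]
qb = [13.0, 14.0, 15.0]

x, order = back_substitution(R, qb)
print(order)  # [3, 2, 1]
print(x)      # [1.0, 2.0, 3.0]
```

The first division uses the last diagonal element and the
last element of the right hand side, which corresponds to
r(3,3) and qb(3) entering the array first.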


The system equation to be solved in this simulation is
     RX = QB
where R is an upper triangular matrix and QB is an n-vector
that resulted from the transformation seen in the previous
section.

Figure 4.35 presents the arrangement of the data with
respect to the systolic array. The data elements are drawn
in such a way that the data synchronization required to
perform the operation is evident. Vector QB and matrix R
actually carry data into the systolic array for processing.
Vector Y enters the array bringing no data. Its elements are
all zeros at the moment represented in Fig. 4.35. Its
values are modified as it flows through the array. Vector X
is generated in cell 05, the Back Substitution Cell, and
its elements are pumped through the array in a direction
opposite to that of vector Y. As they go through cells 04
and 02, they combine with elements from matrix R to build up
the elements of vector Y. Finally, X will come out with its
final value when it leaves the array, from cell number 02 of
Fig. 4.35.

Figures 4.35 and 4.36 should be compared. The latter
presents the same arrangement as the former, but it shows
how the picture appears on the screen. Matrix R is in the
upper block of cells (cells number 07 to 21) (refer to
Appendix C to compare the numerical values). This block does
not belong to the array itself and is used only for
presentation of the data. The systolic array is represented
by cells 05, 04, and 02. The elements of vector QB are
introduced from its left side (at cell 06). The vector of
partial results Y, a string of zeros separated by one clock
cycle delay, is pumped in from the right side (at cell 03).
The output that will collect the solved elements of vector X
is represented by cell 01. Note that the numbers displayed
at cells 04 and 02 are the values assumed by the elements of
vector Y as they flow leftwards through the array. The
number displayed in cell 05 is the element of vector X being
computed, and that in cell 01 is the element of X being
pumped out to the external world.


Another point to notice is the way the bidirectional
connections between cells 05 and 04, and 04 and 02, are
implemented by SYSGRAS. Only one bidirectional arrow is
used.

We decided to associate the elements of each column of
the matrix R (third column in blue, second in red, and first
in green) and the corresponding row elements of vectors X
and QB with the same color (e.g. element number 3 of QB
will interact only with column number 3 of the matrix R).
Cell processor number 05, the so called Back Substitution
Cell, is shaped differently, as mentioned earlier, to
emphasize its special purpose in the systolic array. This
cell processor computes the elements of vector X from the
received data and pumps them out to cell 04 to be used in
the calculation of the partial results referred to
previously. When the X elements complete their trip through
the array, they are pumped out at cell 01.

The manner in which color is used to provide information
is now described. The elements of vectors QB, X and Y, as
well as matrix R, have primary colors (red, blue and green).
When they meet elements of different primary colors, the
cell where this meeting takes place assumes a secondary
color that is the result of that combination. The elements
of vector Y, for example, can be seen with their original
color in cell 03, before mixing with others. During their
trip through the array they keep that color, but as a result
of their combination with elements of different colors, the
cell where they are at a given moment may present a
different secondary color. However, we should keep in mind
that only primary colors travel from one cell to another.
Secondary colors are static.

To make these points clearer, we will track the
formation of elements x(3) and x(2). Since R is an upper
triangular matrix, we have

     x(3) = qb(3) / r(3,3)

Substituting numerical values, as shown in Appendix C,


х (3) = -10.593 / 058142 1020 


Figure 4.36 Back Substitution Array Initialization


As has been pointed out, the Back Substitution Cell, cell
number 05 in Fig. 4.37, is responsible for computing the
elements of X. That Figure presents the computation of
element x(3). Element qb(3) was pumped in from cell 06 (see
Fig. 4.36) and element r(3,3) from cell 10 (see same
Figure). It can also be seen in Fig. 4.37 that x(3) is blue,
since it was formed by the combination of qb(3) and r(3,3),
both blue (remember that the third column of R has been
coded blue). After its computation, element x(3) is pumped
through the array to cell 04 (clock cycle 2, Fig. 4.38), to
cell 02 (clock cycle 3, Fig. 4.39) and finally to the
external world (clock cycle 4, Fig. 4.40) at cell 01. During
this trip, its value is used for computation of the elements
of vector Y which are necessary for computation of the other
elements of vector X. Now let us track the formation of
x(2).


From algebra, we have
     x(2) = (qb(2) - x(3).r(2,3)) / r(2,2)

The partial result x(3).r(2,3) is computed at clock cycle 2,
in cell 04. This cell receives x(3) from cell 05 (computed
at clock cycle 1, see Fig. 4.37) and r(2,3) from cell 08
(see Fig. 4.37). This partial result is called y(2) (color
coded red because it will contribute to the formation of
x(2), whose color code is red). This time y(2), being
originally red, appears in cell 04 as magenta, because the
cell takes on blue as well when y(2) interacts with blue
r(2,3) and blue x(3). At this point of the computation all
the data necessary to compute x(2) are available. Element
y(2) is stored in cell 04, element qb(2) is ready to enter
the array (see cell 06, in Fig. 4.38), and so is element
r(2,2) (see cell 10 of Fig. 4.38). At clock cycle 3, these
elements are pumped into the Back Substitution Cell
(cell 05) (see Fig. 4.39) and the computation of x(2) takes
place. Substituting numerical values (as seen in those
cells) into the above equation, it becomes
     x(2) = (-0.566 + 11.538) / 4.042 = 2.714


Element x(2) assumes the color red because all factors that
contributed to its formation had that color code. After its
computation, x(2) is pumped through the array via cell 04
(clock cycle 4, Fig. 4.40), cell 02 (clock cycle 5,
Fig. 4.41) and finally pumped out to cell 01. During this
trip, as happened with x(3), it contributes to the formation
of y(1), another element of the vector of partial results.
This will be required for the computation of x(1).
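
The computation of x(2) in the Back Substitution Cell can be
replayed with the cell algorithm of Fig. 4.33. The Python
sketch below is an illustration only: the function name is
hypothetical, and the three numbers are the ones quoted in
the text for qb(2), r(2,2) and the partial result y(2).

```python
def back_substitution_cell(a, b, y):
    """Cell of Fig. 4.33: x = (b - y) / a, or 0.0 when a is zero."""
    return (b - y) / a if a != 0 else 0.0


# Values quoted in the text for the second row.
qb2 = -0.566
r22 = 4.042
y2 = -11.538      # x(3).r(2,3), as accumulated in cell 04

x2 = back_substitution_cell(a=r22, b=qb2, y=y2)
print(f"x(2) = {x2:.3f}")  # x(2) = 2.714
```

The test on a zero divisor mirrors the "else" branch of the
cell, which outputs 0.0 when no element of R has arrived yet.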

The sequence of pictures from Fig. 84.36 to 4.484 displays 
the whole computaticnal process in the svstolic array 


according to the algorithm. 


E. FURTHER EXPLORATICNS 


The potential of SYSGRAS goes far beyond the irplenenta- 
tion of systolic algcritnms. Presently, because of fratical 
importarce, extensive research is being carried out on the 
investigation of faults in tbe actual systolic devices 
[Ref. 11]. How to circumvent those faults and which effect 
they do have on the results are some of the subjects under 
study.  SYSGRAS can ke used as a valuable tool in perfcrming 
such studies, because of the simplicity of its interface. 
with the user and because of the possibility of inclusicn of 
possikle subroutines to support the algorithms. To emphasize 
its versatility, we address the fact that a processor cell 
may have a large number of input/output ports {although it 
is presently set up for a maximum of four). This allows a 
large number of connections with other celis or the cutside. 
This characteristic car be used, as example, to inject data 


into a cell that is situated in any position in the array. 
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Figure 4.37 Back Substitution Array after Clock Cycle 1 
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Figure 4.38 Back Substitution Array after Clock Cycle 2 
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Figure 4.39 





Back Substitution Array after Clock Cycle 3 
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Figure 4.40 
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Back Substituticn Array after Clock Cycle 4 
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Figure 4.41 Back Substitution Array after Clock Cycle 5 
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Figure 4.42 Back Substitution Array after Clock Cycle 6 
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Figure 4.33 Back Substitution Array after Clock Cycle 7 
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Figure 4.44 Back Substitution Array after Clock Cycle 8 
These data could consist of logical commands that would
alter the function of the processor for the selected cell.
These commands, in a particular application, could "kill"
the cell, or degrade its performance as desired. Another
approach, which would reduce the data entry volume, would be
to design a subroutine that implements a defective processor
and assign that processor to the target cell. The color
feature can also be used to analyse the effect that faults
at a particular cell will have on the others. If a color is
injected at selected input ports (depending on the software
of the subroutine that supports a processor), it will spread
over the cells that receive data from the faulty cell under
evaluation, and will display the "bad sector" on the screen.

Many other possible studies related to systolic arrays
may be considered due to the flexibility of the SYSGRAS
software.
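
The fault study described above (injecting a color at a
faulty cell and watching it spread) amounts to finding every
cell that receives data, directly or indirectly, from that
cell. The following Python sketch illustrates the idea only;
it is not SYSGRAS code. The `links` mapping and the function
name `bad_sector` are assumptions made for this example.

```python
from collections import deque


def bad_sector(links, faulty_cell):
    """Return the set of cells that receive data, directly or
    indirectly, from faulty_cell (the faulty cell included)."""
    tainted = {faulty_cell}
    queue = deque([faulty_cell])
    while queue:
        cell = queue.popleft()
        for receiver in links.get(cell, ()):
            if receiver not in tainted:
                tainted.add(receiver)
                queue.append(receiver)
    return tainted


# Hypothetical 2 x 3 array: data flows to the right and downwards.
links = {1: [2, 4], 2: [3, 5], 3: [6], 4: [5], 5: [6], 6: []}
print(sorted(bad_sector(links, 2)))
```

In the simulator the same region would appear on the screen
as the set of cells reached by the injected color.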


V. THE INTERACTION WITH THE SIMULATOR 


A. SOFTWARE REQUIREMENTS AND PROCEDURES


In order to operate SYSGRAS, the user must have the
following files:

SYSTOY.PAS
SYSFOR.FOR
LOGIN.COM
JUNK.COM


The user, after compiling SYSTOY.PAS and SYSFOR.FOR,
must link both relocatable codes with the SIGGRAPH core,
which is in the CS3202.CORE library. To do this, the command
to be used is

LINK SYSTOY,SYSFOR,[CS3202.CORE]CORE/LIB

In order to run, enter the command "RUN SYSTOY". The code
will address the RAMTEK peripheral system when running, so
this device should be turned on and ready to operate.


B. ENTERING A NEW PROBLEM


When the command "RUN SYSTOY" is entered, the simulator
starts prompting with all the questions that must be answered
by the user to enter the problem. In Appendix D we have
recorded a session of the simulator to make the interaction
more understandable. It was our original intention to present
the recording of a session for the Matrix Triangularization
problem as a guide to SYSGRAS. However, due to the large
amount of material that would need to be presented, we
decided to include a short "dummy" session in Appendix D for
clarity. It shows how to enter the data and how to handle an
incorrect input. The simulator permits recovery from entry
errors and permits display of the input data as many times as
desired. The user has the following choices when dealing with
the simulator:


- select a new problem run or a previous problem session

- record the session in a separate file for later printing

- interrupt the graphics presentation at the end of each
  clock cycle


It is recommended that the user follow exactly the
nomenclature given in Chapter III, Section E, to avoid
errors that might be difficult to troubleshoot.


C. REVIEW OF A PREVIOUS SESSION


We have included in Appendix E a recording of a session
with SYSGRAS in which a previous session is reviewed. The
user must take care to rename the file that contains the
data from the previous session to SYSARRAY.MEM. This file
must be the most recent version generated by the VAX VMS
operating system.

The following files with data from previous sessions are
already available, and may be run upon request from the
user:


- file DIVDIFF.MEM, with the recording of the simulation
  analysed in Chapter IV, Section A.

- file MULTPLY.MEM, with the recording of the simulation
  analysed in Chapter IV, Section B.

- file GIVENSW.MEM, with the recording of the simulation
  analysed in Chapter IV, Section C.

- file BAKSUBS.MEM, with the recording of the simulation
  analysed in Chapter IV, Section D.


D. RECORDING OF COMPUTED DATA AND PRINTING OUT 


If the user has asked for this facility, SYSGRAS will
write all computed results of each cell at each clock cycle 
into a file named SYSPRINT.DOC, which can be printed out 
after the end of the session. 

VI. THE SIMULATOR MAINTENANCE


A. STRUCTURE OF THE SIMULATOR 


SYSGRAS has been designed in two layers. The upper
layer, which interacts with the user, is in PASCAL, and so it
is quite portable. In fact, due to the use of the
"OTHERWISE" feature offered by the "CASE" statement in the
VAX PASCAL compiler, a few of its case structures have to be
slightly modified before it can be installed on another
machine. The lower layer is written in FORTRAN 77 and is
compiled with VAX FORTRAN. This layer interfaces with the
graphics package SIGGRAPH that is presently available at
NPS on the VAX 750 machine. It has been set up for
presentation on the RAMTEK 9400 system, and so it is not
portable. Its structure, however, can be modified to
interface with another graphics package with characteristics
similar to those of SIGGRAPH.

The PASCAL layer is presented in Appendix A and the
FORTRAN layer in Appendix B.


B. MODULAR DESIGN ASPECTS


The design of SYSGRAS has followed the philosophy of
modular design. The data structure has been designed to
match the abstract idea of a systolic array and its multiple
features, so that it can be easily identified by the
program user. The basic element is the record "NODE". This
element keeps all the information regarding each cell. The
whole array is a collection of NODEs, and these are
assembled into a higher level of the hierarchy by the record
"GRAPH", which contains all information about the whole
array at each clock cycle.
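
As an illustration of this two-level structure, the
following Python sketch models a NODE and a GRAPH. It is not
the PASCAL declaration used by SYSGRAS: the field names
(`processor`, `color`, `inp`, `out`, `mem`, `clock`) and the
helper `active_cells` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List

CONNECTIONS = 4  # ports per cell; the text gives four as the present maximum


@dataclass
class Port:
    datum: float = 0.0       # value carried on the port
    spectrum: str = "BLACK"  # color tagging the value (assumed field)


def _ports() -> List[Port]:
    """Build a fresh, unshared list of ports for one cell."""
    return [Port() for _ in range(CONNECTIONS)]


@dataclass
class Node:
    processor: str = "ROUTINE_0"  # support routine driving the cell
    color: str = "BLACK"          # color shown on the screen
    inp: List[Port] = field(default_factory=_ports)
    out: List[Port] = field(default_factory=_ports)
    mem: List[float] = field(default_factory=lambda: [0.0])


@dataclass
class Graph:
    nodes: List[Node]
    clock: int = 0  # current clock cycle

    def active_cells(self) -> int:
        """Count the cells displayed with a color other than black."""
        return sum(1 for n in self.nodes if n.color != "BLACK")


g = Graph(nodes=[Node() for _ in range(9)])
g.nodes[0].color = "RED"
print(g.active_cells())
```

Keeping everything about a cell inside one record is what
lets a new cell routine be written without knowledge of the
rest of the program.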

The part of the program that is most relevant to the
user concerns the cell processor support routines. If new
types of cells need to be simulated, new support subroutines
must be added to the existing set. This can be done without
complete knowledge of the program implementation because of
the modularity of the design. We now refer specifically to
this kind of software maintenance.


C. DESIGNING AND INSTALLING A NEW PROCESSOR SUPPORT
SUBROUTINE


The cell processor support subroutines modify the global
data structure represented by the variable G of type GRAPH.
The existing set of such subroutines is found under the
comment titled "library routines for cell processors" in
Appendix A. If a new one needs to be created and added, it
should conform to the pattern seen in the examples. The
subroutine implementation must be placed between the
statement "with G.NODES(.I.) do begin" and the statement
"COLOR_PROCESSING(G,I)". This last command calls a
subroutine that computes the color of the cell as the
combination of the different primary colors of the input
data.
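
The color combination can be pictured with a short Python
sketch. It assumes the usual additive mixing of primary
colors (red + green = yellow, red + blue = magenta, green +
blue = cyan, all three = white, black contributing nothing);
the function name `mix` and the bit encoding are assumptions
of this example, not the PASCAL routine itself.

```python
# One bit per primary color; mixing is a bitwise OR of the inputs.
PRIMARY = {"RED": 0b100, "GREEN": 0b010, "BLUE": 0b001}

NAMES = {
    0b000: "BLACK",
    0b100: "RED",
    0b010: "GREEN",
    0b001: "BLUE",
    0b110: "YELLOW",
    0b101: "MAGENTA",
    0b011: "CYAN",
    0b111: "WHITE",
}


def mix(input_colors):
    """Combine the primary colors arriving on a cell's input ports."""
    bits = 0
    for name in input_colors:
        bits |= PRIMARY.get(name, 0)  # BLACK (or unknown) adds nothing
    return NAMES[bits]


print(mix(["RED", "BLUE"]))
print(mix(["BLACK", "GREEN", "RED"]))
```

A cell whose inputs are all black stays black, which is how
idle cells remain distinguishable on the screen.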

The following steps must, therefore, be followed to
introduce a new cell subroutine:

i. increase the constant NUMBER_OF_ROUTINES.

ii. modify the enumerated type PROCESSOR_TYPE to include
another routine reference name (e.g. ROUTINE_9).

iii. modify procedure DEFINE_NODES to include references
to the new subroutine. This should be done in such a
way that the new routine is referred to in the same way
as the others.

iv. modify procedure OUTPUT_ARRAY in the same way as above.

v. modify procedure CORRECT_NODE as above.

vi. modify the main program statement "case PROCESSOR
of" as above.

vii. place the body of the new procedure next to the
existing procedures to conform with the program
structure.
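
The pattern behind steps ii, vi and vii can be sketched in
Python as an enumerated type plus a dispatch table. This is
only an analogy to the PASCAL "case PROCESSOR of" statement:
the two routines, the dictionary-based cell and the reading
of ROUTINE_0 as "no routine assigned" are assumptions of
this example. Steps i and iii to v, which concern data entry
and printing, are not modelled.

```python
from enum import Enum


class ProcessorType(Enum):   # step ii: the type gains a new name
    ROUTINE_0 = 0            # assumed here to mean "no routine assigned"
    ROUTINE_1 = 1
    ROUTINE_9 = 9            # the newly added routine


def inner_product(cell):
    """Hypothetical existing routine: accumulate a*b, pass a and b on."""
    a, b = cell["inp"]
    cell["mem"] += a * b
    cell["out"] = [a, b]


def scale_by_two(cell):
    """Hypothetical new routine (step vii): double each input."""
    cell["out"] = [2 * x for x in cell["inp"]]


# Step vi: the analogue of the main program's case statement.
DISPATCH = {
    ProcessorType.ROUTINE_1: inner_product,
    ProcessorType.ROUTINE_9: scale_by_two,
}


def step(cell):
    """Run one clock cycle of the routine assigned to the cell."""
    routine = DISPATCH.get(cell["processor"])
    if routine is not None:  # cells without a routine stay idle
        routine(cell)
    return cell


cell = {"processor": ProcessorType.ROUTINE_9,
        "inp": [1.5, 2.0], "out": [0.0, 0.0], "mem": 0.0}
print(step(cell)["out"])
```

Because each routine touches only its own cell, adding one
does not disturb the routines already installed.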

Hints for the design of the new procedure:

- the type NODE defines the cell data structure.

- the number of input/output ports in a cell can be
  modified by changing the constant CONNECTIONS in the
  main program. If this is done, the constant MAXLINKS
  must also be modified as instructed in the program
  text comments.


The present implementation of SYSGRAS restricts the
maximum number of cells in an array to 23, in order to
achieve a faster computational speed. If desired, this limit
can be modified by altering the main program constant
MAXNODES. If this is done, the constant MAXLINKS must also
be modified as instructed in the program text comments.

VII. CONCLUSIONS


A. ESTABLISHING COMPARISON PARAMETERS


The simulations that have been studied in Chapter IV
provided us with a reasonable tool for the evaluation of
algorithms implemented in systolic arrays. Important factors
that must be considered in this kind of evaluation are:


i. use of the same type of processors in other algorithms
ii. average percentage of cells involved in effective
computation per clock cycle
iii. degree of homogeneity of cells
iv. degree of the usage of pipelining
v. possibility of modular expandability
vi. required number of external connections to each cell
vii. cell complexity


Unfortunately, not all of these factors can be determined
by using this tool. Subjective judgment also has to play a
role in this evaluation.


B. CONCLUSIONS ABOUT ALGORITHMS UNDER ANALYSIS


The implementation of an algorithm in a systolic array
may provide a greater calculation speed, but at a cost. The
cost can be evaluated in terms of the complexity of the
design, the difficulty of implementation, the manufacturing
problems that can result from a particular array
configuration, etc. We want to make sure that this cost is
not excessive and, ideally, that it is minimal. These
considerations require evaluation criteria to be
established. Instead of trying to come up with criteria that
could be the subject of lengthy discussion, we use the
above-mentioned factors to discuss the efficiency of the
algorithms with a more qualitative than quantitative
approach.

The first factor we want to establish is the average
percentage of cells effectively involved in active
computation per clock cycle in each algorithm. For each
algorithm we consider the number of cycles required to run
to completion, the number of physical cells in the systolic
array, and the number of cells involved in active
computation in each clock cycle. This last quantity
corresponds to the number of cells shown on the screen with
a color other than black. Thus we have:


Matrix Triangularization:

number of cycles = 9

total number of cells = 9

percentage of computation / cycle =
= 1/9 x 1/9 x (1+2+4+5+6+5+3+1+0) =
= 0.333333


Performing similar calculations for the other
algorithms, we obtain the numbers shown in Table III.
Clearly, Matrix Triangularization has a longer "duty
cycle" per cell, and this means less waste.
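
The calculation above can be reproduced with a few lines of
Python. The function name `average_duty_cycle` is an
assumption of this sketch; the per-cycle counts are the ones
quoted for Matrix Triangularization.

```python
def average_duty_cycle(active_per_cycle, total_cells):
    """Average fraction of cells doing useful work per clock cycle.

    active_per_cycle holds, for each clock cycle, the number of
    cells shown with a color other than black.
    """
    cycles = len(active_per_cycle)
    return sum(active_per_cycle) / (cycles * total_cells)


# Active cells in each of the 9 cycles, on an array of 9 cells.
triangularization = [1, 2, 4, 5, 6, 5, 3, 1, 0]
print(round(average_duty_cycle(triangularization, 9), 3))  # 27 / 81
```

The same function applies to the other algorithms once their
per-cycle counts are read off the screen.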

The number of types of cells required for each algorithm
is easily seen in the pictures shown in Chapter IV. Modular
expandability is another important feature. It implies the
possibility of interconnecting chips (each one embedding a
whole systolic array) in a cascade fashion or some other
arrangement where the chips are used as smaller parts of a
higher hierarchical arrangement. Matrix Triangularization
and Back Substitution have restrictions in that respect,
because their cell arrangement is not symmetrical and such
an expansion would require different types of chips.

With respect to these factors, we can summarize in the 
following table: 


TABLE III

Comparison of Algorithm Implementation Evaluation Factors

| FEATURE                  | MATRIX  | BACK   | DIV.  | MATRIX |
|                          | TRIANG. | SUBST. | DIFF. | MULT.  |
|--------------------------|---------|--------|-------|--------|
| Average percentage of    |  0.333  | 0.250  | 0.250 | 0.155  |
| duty cycle (computing    |         |        |       |        |
| per cycle)               |         |        |       |        |
|--------------------------|---------|--------|-------|--------|
| Number of cell types     | [values illegible in source]       |
|--------------------------|---------|--------|-------|--------|
| Modular expandability    |    R    |   R    |   Y   |   Y    |

R = restricted
Y = yes

The number of external connections required by each cell
is perhaps the most demanding factor in practical
implementations. The normal connections are power and clock
lines. Some algorithms, like Matrix Multiplication, need only
these two. Some others have synchronization and data feeding
problems that can only be solved with the addition of extra
external connections to the cell. This feature is somewhat
related to the degree of pipelining that can be achieved.

The Matrix Triangularization and the Divided Differences
algorithms can be practically implemented if there is a flag
signal that can trigger a pumping-out mechanism to send out
the data frozen in each cell after the last clock cycle. To
recover the data and display it externally, additional
connections are needed in each cell. This increases the
total cost of the chip and complicates the problem of
modular expandability. Since a design goal in every VLSI
implementation is to reduce the number of external
connections, this is a key factor in the design effort. The
possibility of pipelining is also affected, because the
pumping-out operation is not part of the normal algorithm
cycle. It is an additional burden that may require the
interruption of the data pumping in, and the pipelining will
have to be restricted to data of the same problem. Data from
different problems cannot coexist in the array at the same
time. The calculation of one problem has to finish before
another can start. In this sense, if we examine the Matrix
Multiplication algorithm, we conclude that true pipelining
can be achieved. No data flow interruption occurs because
the results are not kept frozen in the cells, but are moved
as part of the data flow.

The factor of cell complexity is important not only
because of the possible high cost of the hardware, but also
because of the time that may be required for a clock cycle
to be executed. If a longer time is required, the clock
speed has to be kept low and the efficiency of the
processing decreases. Simple operations are always a goal in
algorithm design because they result in simpler and faster
hardware. This is why another algorithm for Matrix
Triangularization, without square roots, has been suggested.

C. SUGGESTIONS FOR FUTURE MODIFICATIONS ON SYSGRAS


We are conscious that we have not achieved an optimal
design with our simulator in terms of performance. It could
be improved if the graphics implementation were more
efficient. This would speed up the interaction with the user
and make it more attractive. Another point that could be
improved is the user interface. The amount of information
required from the user demands an effort that might be
reduced if other interface techniques were used, such as
the combined use of a mouse or lightpen with the keyboard
input.

Additional points that could be modified to enhance the
presentation on the screen are:

- Elimination of non-significant zeros from the cell
  display (such as turning 000.400 into 0.4).

- Replacement of the bidirectional arrow that represents
  communication in both directions between cells by
  unidirectional arrows, one for each link between cell
  ports. This would make the cells appear on the screen
  the same way they are represented in the literature.
  An example of the bidirectional arrow can be seen in
  Fig. 4.36, in Chapter IV, connecting cells 05 and 04,
  as well as 04 and 02.


D. THE ROLE OF THE SIMULATOR 


Our goal was to contribute to the understanding of the
implementation of algorithms in systolic arrays. This led us
to the realization of some of the inherent problems that
appear in this field. The complexity of the data interaction
in the space-time frame is considered to be the greatest
obstacle in the understanding process. Deciding on the
approach to adopt to handle the problem is also difficult.
In response to that, we designed a tool whose task is to
provide a software implementation of the systolic array. The
user is left free to concentrate on the algorithm. We have
shown the use of this tool in the study of some algorithms.
This gave us the opportunity to realize the power of a
systolic array in the process of performing calculations.
Aspects such as computation time, memory requirements and
I/O interactions are minimized. The mapping of an algorithm
onto a systolic array certainly represents a major
difficulty. The main role of our simulator is not to help in
this mapping, but it can be used in the validation phase of
the algorithm design. The greatest contribution that we see
in SYSGRAS is its versatility. The user can model an ideal
environment for the algorithm or an environment contaminated
by problems in the underlying hardware. This certainly can
help in achieving better results while presenting a clear
idea of the robustness of an algorithm.

[PASCAL source listing of SYSGRAS. The listing pages were
scanned rotated and their text is not recoverable here. The
procedure names that can still be identified are:
CORRECT_LINK, CORRECT_NODE, INITIALIZE_SCREEN,
REFRESH_SCREEN, DIVIDED_DIFFERENCES, BACK_SUBSTITUTION,
GIVENS_EXT_TWO, GIVENS_INT_TWO, BUFFER_CELL, INNER_PRODUCT,
GIVENS_EXTERNAL, GIVENS_INTERNAL, COLOR_PROCESSING,
OUTPUT_LINES, OUTPUT_TO_PRINTER and OUTPUT_ARRAY.]